Function approximation (FA) has been a critical component in solving large zero-sum games. Yet little attention has been paid to FA in solving \textit{general-sum} extensive-form games, even though they are widely regarded as computationally more challenging than their fully competitive or cooperative counterparts. A key challenge is that for many equilibria in general-sum games, there is no simple analogue of the state value function used in Markov Decision Processes and zero-sum games. In this paper, we propose learning the \textit{Enforceable Payoff Frontier} (EPF), a generalization of the state value function for general-sum games. We approximate the optimal \textit{Stackelberg extensive-form correlated equilibrium} by representing EPFs with neural networks and training them with appropriate backup operations and loss functions. This is the first method that applies FA to the Stackelberg setting, allowing us to scale to much larger games while still enjoying performance guarantees based on FA error. Additionally, our proposed method guarantees incentive compatibility and is easy to evaluate without depending on self-play or approximate best-response oracles.
Correlated Equilibrium is a solution concept that is more general than Nash Equilibrium (NE) and can lead to outcomes with better social welfare. However, its natural extension to the sequential setting, the \textit{Extensive Form Correlated Equilibrium} (EFCE), requires a quadratic amount of space to solve, even in restricted settings without randomness in nature. To alleviate these concerns, we apply \textit{subgame resolving}, a technique that has been extremely successful in finding NE in zero-sum games, to solving general-sum EFCEs. Subgame resolving refines a correlation plan in an \textit{online} manner: instead of solving for the full game upfront, it only solves for strategies in subgames that are reached in actual play, resulting in significant computational gains. In this paper, we (i) lay out the foundations to quantify the quality of a refined strategy, in terms of the \textit{social welfare} and \textit{exploitability} of correlation plans, (ii) show that EFCEs possess a sufficient amount of independence between subgames to perform resolving efficiently, and (iii) provide two algorithms for resolving, one using linear programming and the other based on regret minimization. Both methods guarantee \textit{safety}, i.e., they will never be counterproductive. Our methods are the first online methods to be applied to the correlated, general-sum setting.
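The claim that a correlated equilibrium can yield better social welfare than a Nash equilibrium can be illustrated on a small normal-form game. The following is a minimal sketch, not the EFCE or subgame-resolving method of the abstract: it assumes the textbook game of Chicken with illustrative payoffs, and the names `PAYOFFS`, `CORRELATED`, `MIXED_NASH`, `is_correlated_equilibrium`, and `social_welfare` are hypothetical. It checks the incentive constraints of a correlated equilibrium directly and compares welfare against the symmetric mixed NE.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("C", "D")  # C = chicken out, D = dare

# Illustrative payoffs (row player, column player); an assumption for this
# sketch, not data from the abstract.
PAYOFFS = {
    ("D", "D"): (0, 0),
    ("D", "C"): (7, 2),
    ("C", "D"): (2, 7),
    ("C", "C"): (6, 6),
}


def is_correlated_equilibrium(dist):
    """Check that no player gains by deviating from any recommendation.

    `dist` maps joint action profiles to probabilities. For each player,
    each recommended action `rec`, and each alternative `dev`, the expected
    gain from playing `dev` whenever `rec` is recommended must be <= 0.
    """
    for player in (0, 1):
        for rec, dev in product(ACTIONS, repeat=2):
            if rec == dev:
                continue
            gain = Fraction(0)
            for other in ACTIONS:
                followed = (rec, other) if player == 0 else (other, rec)
                deviated = (dev, other) if player == 0 else (other, dev)
                prob = dist.get(followed, Fraction(0))
                gain += prob * (PAYOFFS[deviated][player] - PAYOFFS[followed][player])
            if gain > 0:
                return False
    return True


def social_welfare(dist):
    """Expected sum of both players' payoffs under `dist`."""
    return sum(p * sum(PAYOFFS[profile]) for profile, p in dist.items())


third = Fraction(1, 3)

# A mediator recommends one of three profiles uniformly and never (D, D).
CORRELATED = {("C", "C"): third, ("C", "D"): third, ("D", "C"): third}

# Symmetric mixed NE: each player independently dares with probability 1/3.
MIXED_NASH = {
    (a, b): (third if a == "D" else 1 - third) * (third if b == "D" else 1 - third)
    for a, b in product(ACTIONS, repeat=2)
}
```

Under these assumed payoffs, both `CORRELATED` and `MIXED_NASH` pass `is_correlated_equilibrium`, since every NE is also a correlated equilibrium, while `social_welfare(CORRELATED)` exceeds `social_welfare(MIXED_NASH)`. Exact `Fraction` arithmetic is used so that the indifference conditions of the mixed NE are not disturbed by floating-point rounding.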
Estimating causal effects has become an integral part of most applied fields. Solving these modern causal questions requires tackling violations of many classical causal assumptions. In this work we consider the violation of the classical no-interference assumption, meaning that the treatment of one individual might affect the outcomes of another. To make interference tractable, we consider a known network that describes how interference may travel. However, unlike previous work in this area, the radius (and intensity) of the interference experienced by a unit is unknown and can depend on different sub-networks of those treated and untreated that are connected to this unit. We study estimators for the average direct treatment effect on the treated in such a setting. The proposed estimator builds upon a Lepski-like procedure that searches over the possible relevant radii and treatment assignment patterns. In contrast to previous work, the proposed procedure aims to approximate the relevant network interference patterns. We establish oracle inequalities and corresponding adaptive rates for the estimation of the interference function. We leverage such estimates to propose and analyze two estimators for the average direct treatment effect on the treated. We address several challenges stemming from the data-driven creation of the patterns (i.e., feature engineering) and the network dependence. In addition to rates of convergence, under mild regularity conditions, we show that one of the proposed estimators is asymptotically normal and unbiased.
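The GAMMA Challenge was organized to encourage AI models to screen for glaucoma from a combination of 2D fundus images and 3D optical coherence tomography volumes, as ophthalmologists do.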
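Multi-scenario recommendation is dedicated to retrieving relevant items for users in multiple scenarios, and it is ubiquitous in industrial recommendation systems. These scenarios share some overlap in users and items, while their distributions differ. The key point of multi-scenario modeling is to make efficient, maximal use of whole-scenario information and to generate adaptive representations for both users and items across multiple scenarios. We summarize three practical challenges that are not well solved in multi-scenario modeling: (1) the lack of fine-grained and decoupled control over information transfer among scenarios; (2) insufficient exploitation of entire-space samples; and (3) the problem of disentangling an item's multi-scenario representations. In this paper, we propose a Scenario-Adaptive and Self-Supervised (SASS) model to address these three challenges. Specifically, we design a Multi-Layer Scenario Adaptive Transfer (ML-SAT) module with scenario-adaptive gate units to select and fuse effective transfer information from the whole scenario into each individual scenario in a fine-grained and decoupled way. To fully exploit the power of entire-space samples, we introduce a two-stage training process consisting of pre-training and fine-tuning. The pre-training stage is based on a scenario-supervised contrastive learning task, with training samples drawn from both labeled and unlabeled data spaces. The model is built symmetrically on the user side and the item side, so that we can obtain distinguishing representations of items in different scenarios. Extensive experimental results on public and industrial datasets demonstrate the superiority of the SASS model over state-of-the-art methods. The model also achieves an improvement of more than 8.0% in average watching time per user in online A/B tests.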
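Automated social behaviour analysis of mice has become an increasingly popular research area in behavioural neuroscience. Recently, pose information (i.e., the locations of keypoints or skeletons) has been used to interpret the social behaviours of mice. Nevertheless, effective encoding and decoding of the social interaction information underlying the keypoints of mice has rarely been investigated in existing methods. In particular, it is challenging to model complex social interactions between mice because of their highly deformable body shapes and ambiguous movement patterns. To deal with the interaction modelling problem, we propose a Cross-Skeleton Interaction Graph Aggregation Network (CS-IGANet) to learn the rich dynamics of freely interacting mice, in which a Cross-Skeleton Node-level Interaction module (CS-NLI) is used to model multi-level interactions (i.e., intra-, inter- and cross-skeleton interactions). Furthermore, we design a novel Interaction-Aware Transformer (IAT) to dynamically learn the graph-level representation of social behaviours and update the node-level representation, guided by our proposed interaction-aware self-attention mechanism. Finally, to enhance the representation ability of our model, we propose an auxiliary self-supervised learning task that measures the similarity between cross-skeleton nodes. Experimental results on the standard CRMI13-Skeleton dataset and our PDMB-Skeleton dataset show that our proposed model outperforms several other state-of-the-art approaches.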
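Glaucoma causes irreversible vision loss through damage to the optic nerve, and there is no cure for it. OCT imaging is an essential technique for assessing glaucomatous damage because it helps quantify fundus structures. To promote research on AI technology for OCT-assisted diagnosis of glaucoma, we held the Glaucoma OCT Analysis and Layer Segmentation (GOALS) Challenge in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2022, providing data and corresponding annotations for researchers studying layer segmentation from OCT images and glaucoma classification. This paper describes the 300 released circumpapillary OCT images, the baselines for the two sub-tasks, and the evaluation methodology. The GOALS Challenge can be accessed at https://aistudio.baidu.com/aistudio/competition/competition/detail/230.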
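Many conferences rely on paper bidding as a key component of their reviewer assignment procedure. These bids are then taken into account when assigning reviewers, to help ensure that each reviewer is assigned to suitable papers. However, despite the benefits of using bids, reliance on paper bidding can allow malicious reviewers to manipulate the paper assignment for unethical purposes (e.g., getting assigned to a friend's paper). Several approaches to preventing such manipulation have been proposed and deployed. In this paper, we enumerate certain desirable properties that algorithms for addressing bid manipulation should satisfy. We then offer a high-level analysis of various approaches, along with directions for future research.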
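Traffic signal control (TSC) is a high-stakes domain that is growing in importance as traffic volume grows globally. An increasing number of works apply reinforcement learning (RL) to TSC; RL can draw on an abundance of traffic data to improve signalling efficiency. However, RL-based signal controllers have never been deployed. In this work, we provide the first review of the challenges that must be addressed before RL can be deployed for TSC. We focus on four challenges involving (1) uncertainty in detection, (2) reliability of communications, (3) compliance and interpretability, and (4) heterogeneous road users. We show that the literature on RL-based TSC has made some progress towards addressing each challenge. However, more work should take a systems-thinking approach that considers the impact of other pipeline components on RL.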
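Many recent breakthroughs in multi-agent reinforcement learning (MARL) require the use of deep neural networks, which are challenging for human experts to interpret and understand. On the other hand, existing work on interpretable reinforcement learning (RL) has shown promise in extracting more interpretable decision-tree policies from neural networks, but only in the single-agent setting. To fill this gap, we propose the first set of algorithms that extract interpretable decision-tree policies from neural networks trained with MARL. The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting. We demonstrate that IVIPER learns high-quality decision-tree policies for each agent. To better capture coordination between agents, we propose a novel centralized decision-tree training algorithm, MAVIPER. MAVIPER jointly grows the trees of each agent by predicting the behavior of the other agents using their anticipated trees, and it uses resampling to focus on states that are critical for its interactions with other agents. We show that both algorithms generally outperform the baselines, and that MAVIPER-trained agents achieve better-coordinated performance than IVIPER-trained agents on three different multi-agent particle-world environments.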